XeHE: An Intel GPU Accelerated Fully Homomorphic Encryption Library by Alexander Lyashevsky, Alexey Titov, Yiqin Qiu, and Yujia Zhai
Author: Alexander Lyashevsky, Alexey Titov, Yiqin Qiu, and Yujia Zhai
Language: eng
Format: epub
Publisher: Alexander Lyashevsky, Alexey Titov, Yiqin Qiu, and Yujia Zhai
Published: 2023-04-04T18:55:39+00:00
Multi-tile Scaling
Intel packages multiple computing tiles on a single board for scalable performance (Blythe 2020). Due to underlying complexities, implicit support for multi-tile submission at full performance cannot be counted upon on all platforms. As we showed, this was quite evident in our experience. In general, applications do best when designed to spread work across multiple queues in ways that can easily be matched to the optimal use of a particular platform. This built-in flexibility leads to code that is more portable and more performance portable. In our case, knowing that memory-independent workloads would not automatically be distributed over all tiles of a multi-tile Intel GPU influenced us to adopt a more portable structure in our implementation. To maximize the utilization of multi-tile devices, XeHE maintains one queue for each tile and submits workloads to different queues. Listing 8 shows the implementation details of the library's multi-queue SYCL context: it checks whether multi-tile is supported on the current device via SYCL partition functions, creates an in-order queue for each (sub-)device (tile), and attaches each queue to its corresponding (sub-)device.
The XeHE library achieves explicit multi-tile scaling by submitting workloads to multiple queues, utilizing all the sub-devices initialized at SYCL context creation. Workloads on different queues are assumed to be memory independent. This assumption can be satisfied by submitting independent HE computation graphs to different queues, which reflects real-world applications where different clients send independent computation requests. The assumption simplifies memory management across a multi-tile device and supports a separate memory cache for each queue, as mentioned in the section above. Also, by exploiting fast tile-to-tile shared memory, we can load shared data, such as the security parameter context, on a single tile at initialization and share it across the tiles at run time. This reduces initialization overhead and simplifies the code structure without losing run-time performance.
Listing 8 DPC++ Context with multiple queues
class Context {
    bool igpu = true;
    std::vector<sycl::queue> _queues;

    void generate_queue(bool select_gpu = true) {
        if (select_gpu) {
            sycl::device RootDevice = sycl::device(intel_gpu_selector());
            std::vector<sycl::device> SubDevices;
            try {
                // check if sub devices (tile split) is supported on GPU device
                SubDevices = RootDevice.create_sub_devices<
                    sycl::info::partition_property::partition_by_affinity_domain>(
                    sycl::info::partition_affinity_domain::next_partitionable);
            } catch (...) {
                std::cout << "Sub devices are not supported\n";
                // only use the root device
                SubDevices.push_back(RootDevice);
            }
            // create in-order queues and attach to sub-devices
            sycl::context C(SubDevices);
            for (auto &D : SubDevices) {
                sycl::queue q;
                q = sycl::queue(C, D, sycl::property::queue::in_order());
                _queues.push_back(q);
            }
        } else {
            // create queue based on CPU device
            ...
        }
    }

public:
    Context(bool select_gpu = true) {
        generate_queue(select_gpu);
    }
    ...
};
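Listing 8 only sets up the queues; the text does not show how work is assigned to them. The following is a minimal sketch of one possible dispatch policy, written in plain C++ so that it runs without a SYCL toolchain. It assumes round-robin assignment of independent computation graphs to per-tile queues, which is an illustrative choice and not necessarily what XeHE does. The names `SharedParams`, `TileQueue`, and `MultiQueueDispatcher` are hypothetical and do not come from the library.

```cpp
#include <cstddef>
#include <memory>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in for read-only shared data such as the security
// parameter context: created once, then shared by every tile.
struct SharedParams {
    std::size_t poly_degree;
};

// Hypothetical model of one per-tile in-order queue. Each queue owns a
// private cache, so queues never touch each other's memory.
struct TileQueue {
    std::vector<std::string> jobs;          // kept in submission order
    std::vector<std::size_t> cached_sizes;  // stand-in for a per-queue memory cache
    std::shared_ptr<const SharedParams> params;
};

class MultiQueueDispatcher {
public:
    // num_tiles == 0 falls back to a single queue, mirroring the
    // root-device fallback in Listing 8.
    MultiQueueDispatcher(std::size_t num_tiles,
                         std::shared_ptr<const SharedParams> params)
        : queues_(num_tiles == 0 ? 1 : num_tiles) {
        for (auto &q : queues_) {
            q.params = params;  // every queue points at the same shared data
        }
    }

    // Assign one independent computation graph to the next queue in
    // round-robin order (assumed policy) and return that queue's index.
    std::size_t submit(std::string graph_name, std::size_t buffer_bytes) {
        const std::size_t idx = next_;
        next_ = (next_ + 1) % queues_.size();
        queues_[idx].jobs.push_back(std::move(graph_name));
        queues_[idx].cached_sizes.push_back(buffer_bytes);
        return idx;
    }

    const TileQueue &queue(std::size_t i) const { return queues_.at(i); }
    std::size_t num_queues() const { return queues_.size(); }

private:
    std::vector<TileQueue> queues_;
    std::size_t next_ = 0;
};
```

In a real SYCL build, each `TileQueue` would wrap one of the `sycl::queue` objects created in Listing 8, and `submit` would enqueue kernels on it. Because the graphs are memory independent, no synchronization between queues is needed in this model; only the read-only parameters are shared.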
